Efficient parallel coordinate descent algorithm for 
convex optimization problems with separable 
constraints: application to distributed MPC^ 

Ion Necoara'', Dragos Clipici^ 

'^University Politehnica Bucharest, Automatic Control and Systems Engineering 
Department, 060042 Bucharest, Romania 
(e-mail: ion.necoara@acse.pub.ro, d.clipici@acse.puh.ro) 



Abstract 

In this paper we propose a parallel coordinate descent algorithm for solving 
smooth convex optimization problems with separable constraints that may arise 
e.g. in distributed model predictive control (MPC) for linear network systems. 
Our algorithm is based on block coordinate descent updates in parallel and 
has a very simple iteration. We prove (sub) linear rate of convergence for the 
new algorithm under standard assumptions for smooth convex optimization. 
Further, our algorithm uses local information and thus is suitable for distributed 
implementations. Moreover, it has low iteration complexity, which makes it 
appropriate for embedded control. An MPC scheme based on this new parallel 
algorithm is derived, for which every subsystem in the network can compute 
feasible and stabilizing control inputs using distributed and cheap computations. 
For ensuring stability of the MPC scheme, we use a terminal cost formulation 
derived from a distributed synthesis. Preliminary numerical tests show better 
performance for our optimization algorithm than other existing methods. 
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convergence rate, distributed model predictive control, embedded control. 



*The research leading to these results has received funding from: the European Union, Sev- 
enth Framework Programme (FP7/2007-2013) under grant agreement no 248940; CNCSIS- 
UEFISCSU (project TE, no. 19/11.08.2010); ANCS (project PN II, no. 80EU/2010); Sec- 
toral Operational Programme Human Resources Development 2007-2013 of the Romanian 
Ministry of Labor, Family and Social Protection through the Financial Agreement POS- 
DRU/89/1.5/S/62557. 



Preprint submitted to Elsevier 



February U, 2013 



1. Introduction 



Model predictive control (MPC) has become a popular advanced control 
technology implemented in network systems due to its ability to handle hard in- 
put and state constraints [2^ • Network systems are usually modeled by a graph 
whose nodes represent subsystems and whose arcs indicate dynamic couplings. 
These types of systems are complex and large in dimension, whose structures 
may be hierarchical and they have multiple decision-makers (e.g. process control 
21 1, traffic and power systems 0,0]' flight formation [l^). 

Decomposition methods represent a very powerful tool for solving distributed 
MPC problems in network systems. The basic idea of these methods is to 
decompose the original large optimization problem into smaller subproblems. 
Decomposition methods can be divided in two main classes: primal and dual 
decomposition methods. In primal decomposition the optimization problem is 
solved using the original formulation and variables via methods such as interior- 
point, feasible directions, Gauss- Jacobi type and others 



251. In dual 



decomposition the original problem is rewritten using Lagrangian relaxation 
for the coupling constraints and the dual problem is solved with a Newton or 



(sub)gradient algorithm Il6l . |15|. In [23|, |25| cooperative based dis 



tributed MPC algorithms are proposed based on Gauss- Jacobi iterations, where 
asymptotic convergence to the centralized solution and feasibility for their iter- 
ates is proved. In Q non-cooperative algorithms are derived for distributed 
MPC problems, where communication takes place only between neighbors. In 
Q] a distributed algorithm based on interior-point methods is proposed whose 
iterates converge to the centralized solution. In Q dual distributed 

gradient algorithms based on Lagrange relaxation of the coupling constraints are 
presented for solving MPC problems, algorithms which usually produce feasible 
and optimal primal solutions in the limit. While much research has focused on 
a dual approach, our work develops a primal method that ensures constraint 
feasibility, has low iteration complexity and provides estimates on suboptimality. 
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Further, MFC schemes tend to be quite costly computation-wise compared 
with classical control methods, e.g. FID controllers, so that for these advanced 
schemes we need hardware with a reasonable amount of computational power 
that is embedded on the subsystems. Therefore, research for distributed and 
embedded MFC has gained momentum in the past fcw years. The concept 
behind embedded MFC is designing a control scheme that can be implemented 
on autonomous electronic hardware, e.g programmable logic controllers (FLC) 



24| or field-programmable gate arrays (FFGAs) Q. Such devices vary widely 



in both computational power and memory storage capabilities as well as cost. 
As a result, there has been a growing focus on making MFC schemes faster 
by reducing problem size and improving the computational efficiency through 
decentralization 21|, moving block strategies (e.g. by using latent variables 



or Laguerre functions 



27| ) and other procedures, allowing these schemes to be 



implemented on cheaper hardware with little computational power. 

The main contribution of this paper is the development of a parallel coordi- 
nate descent algorithm for smooth convex optimization problems with separable 
constraints that is computationally efficient and thus suitable for MFC schemes 
that need to be implemented distributivcly or in hardware with limited compu- 
tational power. This algorithm employs parallel block-coordinate updates for 
the optimization variables and has similarities to the optimization algorithm 
proposed in [23|, but with simpler implementation, lower iteration complexity 



and guaranteed rate of convergence. We derive (sub)linear rate of convergence 
for the new algorithm whose proof relies on the Lipschitz property of the gra- 
dient of the objective function. The new parallel algorithm is used for solving 
MFC problems for general linear network systems in a distributed fashion using 
local information. For ensuring stability of the MFC scheme, we use a terminal 
cost formulation derived from a distributed synthesis and we eliminate the need 
for a terminal state constraint. Compared with the existing approaches based 
on an end point constraint, we reduce the conservatism by combining the under- 
lying structure of the system with distributed optimization 1191 1. Because 
the MFC optimization problem is usually terminated before convergence, our 
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MPC controller is a form of suboptimal control. However, using the theory of 
suboptimal control 2M '^^^ ^^i^l guarantee feasibility and stability. 

This paper is organized as follows. In Section [2] we derive our parallel co- 
ordinate descent optimization algorithm and prove the convergence rate for it. 
In Sections 13. m3. 21 we introduce the model for general network systems, present 
the MPC problem with a terminal cost formulation and provide the means for 
which this terminal cost can be synthesized distributively. In Sections l3.3ll3.4l we 
employ our algorithm for distributively solving MPC problems arising from net- 
work systems and discuss details regarding its implementation. In Section |4] we 
compare its performance with other algorithms and test it on a real application 
- a quadruple water tank process. 

2. A parallel coordinate descent algorithm for smooth convex prob- 
lems with separable constraints 

We work in R" composed by column vectors. For u,v G R" we denote the 
standard Euclidean inner product = u^v, the Euclidean norm ||u|| = 



•\/(?Vu) and \\x\\p = Px. Further, for a symmetric matrix P, we use P >■ 
(P ^ 0) for a positive (semi) definite matrix. For matrices P and Q, we use 
diag(P, Q) to denote the block diagonal matrix formed by these two matrices. 

In this section we propose a parallel coordinate descent based algorithm for 
efficiently solving the general convex optimization problem of the following form: 

r- min /(u\...,u*0, (1) 

where G R"" with i = 1, . . . ,M, are the decision variables, constrained to 
individual convex sets U' C R"". We gather the individual constraint sets 
into the set U = x • ■ • x U^^, and denote the entire decision variable for 
(P) by u = [(u^)^ . . . (u'*^)^]'^ e R"", with nu = YJILi K- As we will show in 
this section, the new algorithm can be used on many parallel computing archi- 
tectures, has low computational cost per iteration and guaranteed convergence 
rate. We will then apply this algorithm for solving distributed MPC problems 
arising in network systems in Section [3l 
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2.1. Parallel Block-Coordinate Descent Method 

Let us partition the identity matrix in accordance with the structure of the 
decision variable u: 

where i?' £ for all i = 1, ■ ■ ■ , M. With matrices we can represent 

u = X^t^i -E'^u*. We also define the partial gradient Vi/(u) E M"" of /(u) as: 
Vi/(u) = (i?*)-^V/(u). We assume that the gradient of / is coordinate-wise 
Lipschitz continuous with constants Li > 0, i.e: 

\\VJ{u + E'h,) - VJ(u)|| < L, \\h4 Vu e M"", h, e R<. (2) 

Due to the assumption that / is coordinate-wise Lipschitz continuous, it can be 



easily deduced that 



17|: 



f{u + E'h,)<f{u) + {V,f{u),h,) + ^\\h,f VugM"", /i, gR<. (3) 
We now introduce the following norm for the extended space M"": 

M 

||u||? = ^L,||u'f , (4) 

i=l 

which will prove useful for estimating the rate of convergence for our algorithm. 
Additionally, if function / is smooth and strongly convex with regards to \\-\\^ 
with a parameter cti , then jl8| : 

/(w) > /(v) + (V/(v), w - v) + ^ llw - v|l^ Vw, V e M"". (5) 

Note that if / is strongly convex w.r.t the standard Euclidean norm ||-|| with 
a parameter ctq, then cto > aiLl^^^, where ij,iax — niaxi,;. By taking w = 

i 

V -t- E'^hi and v = u in ([5]) we also get: 

f{u + E'h,)>f{u) + {VJ{u),h,) + ^\\h,f VueR"",/i, 
and combining with ^ we also deduce that <ti < 1. 
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We now define the constrained coordinate update for our algoritlim: 

L 2 

v'(u) = arg min (Vj/(u), v' - u*) + llv' - u*|| 
u*(u) = u + £;*(v'(u) -u*), i = 1,...,M. 

The optimality conditions for the previous optimization problem are: 

(V,/(u) +L,(v*(u) -u*),v* - v''(u)) > Vv^ e U\ (6) 

Taking v' ~ u' in the previous inequality and combining with ([3]) we obtain the 
following decrease in the objective function: 

/(u)-/(u'(u))>:|||v^(u)-u'f . (7) 

We now present our Parallel Coordinate Descent Method, that resembles 
the method in [2^ but with simpler implementation, lower iteration complexity 
and guaranteed rate of convergence, and is a parallel version of the coordinate 
descent method from \X'i 



Algorithm PCDM 

Choose uj, e for all z = 1, . . . , M. For fc > 0: 

1. Compute in parallel v'(ufc), i = 1, . . . , M. 

2. Update in parallel: 

Ufe+1 = ^^7^ + i = i,...,M. 



Note that if the sets are simple (by simple we understand that the pro- 
jection on these sets is easy), then computing v'(u) consists of projecting a 
vector on these sets and can be done numerically very efficient. For exam- 
ple, if these sets are simple box sets, i.e U' = |u* e ]R""|u5„j„ < u* < uj^^^j, 
then the complexity of computing v*(u), once Vi/(u) is available, is ©(njj). 
In turn, computing Vi/(u) has, in the worst case, complexity 0(njjr7,u) for 
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quadratic dense functions. Thus, Algorithm PCDM has usually a very low iter- 
ation cost per subsystem compared to other existing methods, e.g. Jacobi type 
algorithm presented in |23j , which usually require numerical complexity at least 



^(("■u)'^ + "-u^-u) per iteration for each subsystem i, provided that the local 
quadratic problems are solved with an interior point solver. Also, in the follow- 
ing two theorems we provide estimates for the convergence rate of our algorithm, 
while for the algorithm in only asymptotic convergence is proved. 

From (O-©, convexity of / and u^+i = --ju'(ufe) we see immediately 
that method PCDM decreases strictly the objective function at each iteration, 
provided that 7^ u,, where u, is the optimal solution of i.e.: 



/(ufc+i) < /(ufc) Vfc > 0, Ufe ^ 



(8) 



Let /* be the optimal value in optimization problem ((Ij. The following 
theorem derives convergence rate of Algorithm PCDM and employs standard 



techniques for proving rate of convergence of the gradient method 



Theorem 1. If function f in optimization problem ([TJ has a coordinate-wise 
Lipschitz continuous gradient with constants Li as given in ([2]), then Algorithm 
PCDM has the following suhlinear rate of convergence: 



f{nk)-f* < 



M 



M 



ir2 + /(uo)-.r 



where = ||uo — u. 



Proof. We introduce the following term: 

AI 



rl 



Ufe-U* 



i=l 



where u* is the optimal solution of ([T]) and u* = (i?')"^u*. Then, using similar 



derivations as in 

M 



ITf, we have: 



'k+ 



i=i 



^ri+f:-(--^)\\nuk) 

i=i 



M 



(v,/K-),u:-v*(ufc)) 



7 



i=l 

(VJK),vXufc) -u^,) + (V,/(ufc),<-uI.)). 
By convexity of / and ([3]) we obtain: 

< r2-2(/(ufc+i) - /K)) + A (V/K), u, - u,.) (9) 
and adding up these inequalities we get: 

ir^ + /(uo)-r>ir^+i+/(u,+i)-r+i^^(/(u,)-.r) 

>/(u,+i)-r + l:^(/K)-r). 

Taking into account that our algorithm is a descent algorithm, i.e. /(uj) > 
/(ufc+i) for all j < k and by the previous inequality the proof is complete. □ 

Now, we derive linear convergence rate for Algorithm PCDM, provided that 
/ is additionally strongly convex: 

Theorem 2. Under the assumptions of Theorem [7] and if we further assume 
that f is strongly convex with regards to \\-\\^ with a constant u\ as given in ([5|), 
then the following linear rate of convergence is achieved for Algorithm PCDM: 

^("''-^•<'-WTi;n)"(r"+«""'-'")' 

Proof. We take w = u* and v = Ufc in ([Sj and through ^ we get: 

ir^^i + /(Ufc+i) - /* <^rl + /(u,) - /* - jjifiu,) f * + ^ri). (10) 

From the strong convexity of / in ([S]) we also get: 

f{v^k)-f*^^-^rl>a,rl. 

We now define 7 = £ [0, 1] and using the previous inequality we obtain 

the following result: 

/(Ufc) - /* + ^rl >7 (/(Ufc) - /* + '^rf) + (1 - ^)a,rl 
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Using this inequality in ([TU)) we get: 

ir2^i + /(u,+i) _ /* < (l - ^) (^Irl + /(u,) - . 
Applying this inequality iteratively, we obtain the following for fc > 0: 

Ki+f{u,)-r<[i-^y Qr^ + /(uo) - r) , 

and by replacing 7 = complete the proof. □ 

The following properties follow immediately for our Algorithm PCDM. 

Lemma 1. For the optimization problem ([1]) , with the assumptions of Theorem 

we have the following statements: 
(i) Given any feasible initial guess Uq, the iterates of the Algorithm PCDM are 
feasible at each iteration, i.e. G U* for all k >0. 

(a) The function f is nonincreasing, i.e. f{\ik+i) < f{u.k) according to ([5]). 
(Hi) The sub(linear) rate of convergence of Algorithm PCDM is given in Theo- 
rem\^ (Theorem\^. 

3. Application of Algorithm PCDM to distributed suboptimal MPC 

The Algorithm PCDM can be used to solve distributively input constrained 
MPC problems for network systems after state elimination. In this section we 
show that the MPC scheme obtained by solving approximately the correspond- 
ing optimization problem with Algorithm PCDM is stable and distributed. 

3.1. MPC for network systems with terminal cost and without end constraints 

In this paper we consider discrete-time network systems, which are usually 
modeled by a graph whose nodes represent subsystems and whose arcs indicate 
dynamic couplings, defined by the following linear state equations 

- ^''^t + ' » = 1, ■ • • , M, (11) 

where M denotes the number of interconnected subsystems, G and 
ul £ ]R™J represent the state and respectively the input of jth subsystem at 



9 



time t, A'^ e ]R".xn, ^ ^ij ^ jgn.xm, j^i ^^vc set of indices which contains 
the index i and that of its neighboring subsystems. A particular case of 



that is frequently found in literature 



16 



23 



25|, has the following dynamics: 



For stability analysis, we also express the dynamics of the entire system: xt+i = 

M M 

Axt + But, where n = ^ ti,, to = ^ m,, e M", Ut G M" and A £ M"^", 

i=l i=l 

B G K"^™. For system ([TI]) or we consider local input constraints: 

G [/* i = I,-- - t > 0, (13) 

with [/* C M™» compact, convex sets with the origin in their interior. We also 
consider convex local stage and terminal costs for each subsystem i: €'(x*,u') 
and i\{x^). Let us denote the input trajectory for subsystem i and the overall 
input trajectory for the entire system by: 

= [(«^)^ . . . (ui,_,)T, u = [(uT • • • (u^TF. 

We can now formulate the MFC problem for system (|lip over a prediction 
horizon of length and a given initial state x as [2^ : 

(M N-l \ 
tit^ J 

s.t: =Y^ A'^xi + B'^ul, xi^ = x\ i =!,■■■ ,M, t>Q. 

It is well-known that by eliminating the states using dynamics PT|) . the 
MFC problem can be recast [2^ as a convex optimization problem of type 
([1]), where = Nnii, the function / is convex (recall that we assume the 
stage and final costs €"{■) and t\{-) to be convex), whilst the convex sets U' 
are the Cartesian product of the convex sets J7' for N times. We denote the 
approximate solution produced by Algorithm FCDM for problem ([H)) after 
certain number of iterations with u*-"^. We also consider that at each MFC 
step the Algorithm FCDM is initialized (warm start) with the shifted sequence 
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of controllers obtained at the previous step and the feedback controller k(-) 
computed in Section [321 below. The suboptimal MPC scheme corresponding to 
would now be: 



Suboptimal MPC scheme 



Given initial state x and initial repeat: 

1. Recast MPC problem dH]) as opt. problem ([T]) 

2. Solve dl]) approximately vifith Alg. PCDM 
starting from u*-''-* and obtain u*-''-* 

3. Update x. Update u*-''-* using warm start. 



3.2. Distributed synthesis for a terminal cost 

We assume that stability of the MPC scheme (fT4|) is enforced by adapting 
the terminal cost £f(-) = X)fii^f(') ^^^"^ ^^e horizon length appropriately 
such that sufficient stability criteria are fulfilled fl|l2|,Q. Usually, stability of 

II ' 1 1 2 II "112 

MPC with quadratic stage cost u') = |f* Hq; + ||'"* > where the matrices 
^ and >~ 0, and without terminal constraint is enforced if the following 
criteria hold: there exists a neighborhood of the origin Vt C M", a stabilizing 
feedback law k(-) and a terminal cost €f(-) such that we have 



where the matrices Q and R have a block diagonal structure and are composed 
of the blocks and i?*, respectively. As shown in flQ,Q, MPC schemes 
based on the condition (fT5|) are usually less conservative than schemes based 
on end point constraint. Keeping in line with the distributed nature of our 
system, the control law k(-) and the final stage cost £t(-) need to be computed 
locally. In this section we develop a distributed synthesis procedure to construct 
them locally. We choose the terminal cost for each subsystem i to be quadratic: 

1 1 ■ 1 1 2 

(.\{x\) = ||x5y||pi, where >- 0. For a locally computed k(-), we employ 




(15) 



11 



distributed control laws: = F'a;*, i.e. k(-) is taken linear with a block- 
diagonal structure. Centralized LMI formulations of ([T5|) for quadratic terminal 
costs arc well-known in the literature [2^. However, our goal is to solve (|15p 
distributively. To this purpose, we first need to introduce vectors x-^ £ W^u^ 
and S R"'-vi for subsystem i, where rif^i = rij and mj^fi = rrij. 

These vectors are comprised of the state and input vectors of subsystem i and 



those of its neighbors: x 



.AT' 



T 



T 



Since our synthesis procedure needs to be distributed and taking into account 
that ^f(-) = y::_] ei(-), we impose the following distributed structure to ensure 
(|15p (see also for a similar approach where infinity-norm control Lyapunov 
functions arc synthesized in a decentralized fashion by solving linear programs 
for each subsystem) for i = 1 , • • • , Af : 

tfiix')+)-ef{ix')) + {F'x'fR'F'x' + ixYQ'x' < q\x^') Vx^'e R"aa. (16) 

such that q{x) = X]f=i9*(^^') — 0- We assume that (7*(x^') also have a 



quadratic form, with q''(x^ ) 



x^' 



2 



, where W^' € R'V'X'V-. Being 



a sum of quadratic fimctions, q(x) can itself be expressed as a quadratic func- 



tion, q{x) = where W € R"^" is formed from the appropriate block 



components of matrices W"^ . Note that we do not require that matrices W''^ 
be negative semidefinite. On the contrary, positive or indefinite matrices al- 
low local terminal costs to increase so long as the global cost still decreases. 
This approach reduces the conservatism in deriving the matrices and f . 
For obtaining and F\ we introduce matrices E^^ e R"'^", e 
J^^ e R"Ar.x", J^' e ]R"V.x™ such that x' = Eix, u' ^ E',^u, x^' = J:^' x 
and u-^' = J:^,'u. We now define the matrices A^' = E^AiJ:^')^, B-^' = 
EIB{J^^Y and F^^ = J^''F{J^^Y, as to express the dynamics (HH) for sub- 
system i: x\j^i ~ (A"^' -f- _B'^'P^')x^\ Using these notations we can now 
recast inequality ([T6| as: 



[A^^ + B^'F^^fP'iA^^ + B^'f^^) (17) 
- jf{El,f{P' + + {F'fR'F')El^{jf f ^ 
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The task of finding suitable P\ and W^' matrices is now reduced to the 
following optimization problem: 



min {S : MI i = 1, ■ • • , M, W < 61}. 



(18) 



It can be easily observed that if the optimal value 6* < 0, consequently < 
and (jisp holds. This optimization problem, in its current nonconvex form, 
cannot be solved efficiently. However, it can be recast as a sparse SDP if we 
can reformulate P7)) as an LMI. We need now to make the assumption that all 
the subsystems have the same dimension for the states, i.e. n,; = nj for all 
Subsequently, we introduce the well-known linearizations: — (S"*)^^, = 
y G"^ and a series of matrices that will be of aid in formulating the LMIs: 

G^' /|_^.| ® G, G-^'\< ^ [0 /|^.|_^ ® G], S^' = diag(5\A.,/(„^,_„,)) 





^^'G-^'+s^'y^" 




(QO^G 








{R^)W^ 



where the blocks are of appropriate dimensionj^. 
Lemma 2. // the following SDP: 



min 6 



(19) 



s.t: 



1 



>■ 



yij = \fj eM\ i = 1, ■ • • , Af, r< 
has an optimal value 6* < 0, then (jl5|) /lowj^. 



(20) 



^By /,i we denote the identity matrix of size n X n, by (g) we denote the standard Kronecker 

product and by \Af'\ the cardinality of the set J\f'. 

^By * we denote the transpose of the symmetric block of the matrix. 
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Proof. From ^ we observe that S^^ >- 0, so that {S-^* -G^')'^{S^')-\S^' 
— ) ^ 0, which in turn imphes 



(21) 



If we apply the Schur complement to (j20p . we obtain: 

^G^' + {G^^f - S-^' + W^^ - (T^')T{^s^' yiT^' _ (y^)TJ,^ 

and by ^ we get (G^*)"^ UjU^ ^ [S^^ )-^T^^ + (T^^H {G^^)-^-{S^^)-^ 
< (G-^')~^W^-^'(G'^')-i, which is equivalent to 1^ if we consider W^' = 
{G^')-^W^\G^^y\ □ 

There exist in literature many optimization algorithms (see e.g. [l^ ) for solving 
distributively sparse SDP problems in the form (|19p . 

3.3. Stability of the MFC scheme 

Wc can consider the cost function of the MFC problem Vn{xtX\P^) as a 
Lyapunov function, using the standard theory for suboptimal control (see e.g. 
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19 



2C 



22 



23| for similar approaches). We also consider that at each MFC 



step the Algorithm FCDM is initialized (warm start) with the shifted sequence 
of controllers obtained at the previous step and the feedback controller k(-) 
computed in Section l3^ such that (fT5|) is satisfied and we denote it by (u'-'^)+. 
Assume also that k(-), €/(•) and a > are chosen such that, together with the 
following set 

= {2: G R" : if{x) < a}, 



satisfies ([T5|) . Then, using Theorem 3 from 



12l | we have that our MFC controller 



stabilizes asymptotically the system for all initial states x G X^, where 

Xn ^{xe M" : V^iix) <Nd + a}, 

such that V/v(x, u'^^) < Nd + a, where d > is a parameter for which we have 
i{x, u) > d for all x ^ il. Clearly, this MFC scheme is locally stable with a 
region of attraction X^. 
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3.4- Distributed implementation of the MPC scheme based on Algorithm PCDM 

In this section wc discuss some technical aspects for the distributed imple- 
mentation of the MPC scheme derived above when using Algorithm PCDM to 
solve the control problem (HH). Usually, in the linear MPC framework, the local 
stage and final cost are taken of the following quadratic form: 

tix\u') = \\x'\\l, + \\uf^^, eiix') = \\x^\\l,, 

where the matrices Q\P'^ G ]^"ixni ^j-g positive semidefinite, whilst matrices 
Ri G ^rriixnii g^j.g positive definite. We also assume that the local constraints 
sets W are polyhedral. In this particular case, the objective function in (HH), 
after eliminating the dynamics, is quadratically strongly convex, having the 
form 0: 

/(u) = 0.5 u'^Qu + {Wx + wfu, 

where Q is positive definite due to the assumption that all i?* are positive def- 
inite. Usually, for the dynamics pT|) the corresponding matrices Q and W 
obtained after eliminating the states are dense and despite the fact that Algo- 
rithm PCDM can perform parallel computations (i.e. each subsystem needs to 
solve small local problems) we need all to all communication between subsys- 
tems. However, for the dynamics (|12[) the corresponding matrices Q and W 
are sparse and in this case in our Algorithm PCDM we can perform distributed 
computations (i.e. the subsystems solve small local problems in parallel and 
they need to communicate only with their neighborhood subsystems as detailed 
below). Indeed, if the dynamics of the system are given by then 

t 

and thus the matrices Q and W have a sparse structure (see also Q). Let us 
define the neighborhood subsystems of a certain subsystem i as A/"* = A/"' U {I : 
I G J\f^ ,j G TV"*}, where TV"* = {j : i G A/'-' }, then the matrix Q has ah the 
block matrices Q'-' = for all j ^ AA* and the matrix W has all the block 
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matrices W'^ = for all j ^ A/"', for any given subsystem i. Thus, the ith block 
components of V/ can be computed using only local information: 

V,/(u) Q'^"^ + ^'^^^ + (22) 

Note that in Algorithm PCDM the only parameters that we need to compute 
are the Lipschitz constants Li. However, in the MPC problem, Li does not 
depend on the initial state x and can be computed locally by each subsystem 
as: Li — Ai„ax(Q")- From the previous discussion it follows immediately that 
the iterations of Algorithm PCDM can be performed in parallel using distributed 
computations (see (P^ ). 

Further, our Algorithm PCDM has a simpler implementation of the iterates 
than the algorithm from j^]: in Algorithm PCDM the main step consists of 
computing local projections on the sets U* (in the context of MPC usually these 
sets are simple and the projections can be computed in closed form); while in the 
algorithm from 2^ this step is replaced with solving local dense QP problems 
with the feasible set given by U* (even in the context of MPC this local QP 
problems cannot be solved in closed form and an additional QP solver needs to 
be used). Finally, the number of iterations for finding an approximate solution 
can be easily predicted in our Algorithm (see Theorems [1] and [2]) , while in the 



algorithm from 



the authors prove only asymptotic converge. 



4. Numerical Results 



Since our Algorithm PCDM has similarities with the algorithm from [23[, in this 
section we compare these two algorithms on controlling a laboratory setup with 
DMPC (4 tank process) and on MPC problems for random network systems of 
varying dimension. 

4.L Quadruple tank process 

To demonstrate the applicability of our Algorithm PCDM, we apply this 
newly developed method for solving the optimization problems arising from the 
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Figure 1: Quadruple tank process diagram. 



MPC problem for a process consisting of four interconnected water tanks, see 
Fig. [1] for the process diagram, whose objective is to control the level of water 
in each of the four tanks. For this plant, there are two types of system inputs 
that can be considered: the pump flows, when the ratios of the three way valves 
are considered fixed, or the ratios of the three way valves, whilst having fixed 
flows from the pumps. In this paper, we consider the latter option, with the 
valve ratios denoted by 7^ and 7t,, such that tanks 1 and 3 have inflows 7a and 
(1 — ^a)<la, while tanks 2 and 4 have inflows jbQb and (1 — "fb)<lb- The simplifled 
continuous nonlinear model of the plant is well known [l| . We use the following 
notation: hi are the levels and Ui are the discharge constants of tank i, S is the 
cross section of the tanks, 7^, "fb are the three-way valve ratios, both in [0, 1], 
while Qa and qt are the pump flows. 



Par am 


S 


ai 


0.2 


as 


a4 






"3 


"-4 


max 


■7° 

la 




Value 


0.02 


5.8e-5 


6.2e-5 


2e-5 


3.6e-5 


0.19 


0.13 


0.23 


0.09 


0.39 


0.58 


0.54 


Unit 


2 

m 


m? 


m2 


m2 




m 


m 


m 


m 


h 







Table 1: Quadruple tank process parameters. 



The discharge constants a^, with i = 1, . . . , 4 and the other parameters of the 

17 



model are determined experimentally from our laboratory setup (see Table [T]). 
We can obtain a linear continuous state-space model by linearizing the nonlinear 
model at an operating point given by h^, 7)^, 7^, and the maximum inflows from 
the pumps, with the deviation variables — hi — hf^u^ ~ la — 7a, ~ 7b ^ 7fc ■ 



dx 
'dt 



s / 2h 

where r,- = — a/ — ^, i = 1, . . . , 4, is the time constant for tank i. 

Using zero-order hold method with a sampling time of 5 seconds we obtain 
the discrete time model of type with the partition ^ \x^ x^]^ and 

x"^ -ir- [x^ x^]"^. For the input constraints of the MFC scheme we consider 
the practical constraints of the ratios of the three way valves for our plant, i.e 

G [0.15, 0.8] — 7o, where 7q is the linearization input. Due to the fact that 
our plant has overflow sensors fitted to the tanks and an emergency shutoff 
program, we do not introduce constraints for the states. For the stage cost we 
have taken the weighting matrices to be = 1^ and i?' = 0.01/™;. 



-f f 
-± ± 

T2 T3 










T4 
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^.2. Implementation 0/ the MFC scheme using MPI 

In this section we underline the benefits of Algorithm FCDM when it is 
implemented in an appropriate fashion for the quadruple tank MFC scheme. 
We implemented for comparison. Algorithm FCDM and that of [23|. Both 



algorithms were implemented in C programming language, with parallelization 
ensured via MFI and linear algebra operations done with CLAFACK. Algorithm 
23j requires solving, at each step, 2 QF problems in parallel, problems which 
cannot be solved in closed form. For solving these QF problems, we use the qpip 
routine of the QFC toolbox 2^. The algorithms were implemented on a FC, 
with 2 Intel Xeon E5310 CFUs at 1.60 GHz and 4Gb of RAM. For the MFC 
problem in this subsection we control the plant such that the levels and inputs 
will reach those of the steady state linearization values and 7'^. 



18 



Figure 2: Total costs for 50 MFC steps with FCDM (white) and [231 (black). 



Figure [2]outlines a comparison of the two algorithms for solving this quadru- 
ple tank MFC problem, considering a prediction time of 150 seconds, for differ- 
ent prediction horizons N and sampling time r. such that tN = 150 seconds. 

50 

The bar values represent the total sum VN{xt, u) for 50 MFC steps, where 

t=i I 
u is calculated either with FCDM or with the algorithm from [23|. For the 

same 50 simulation steps, we outline in Table [2] a comparison of the average 

number of iterations achieved by both algorithms and the performance loss, 

i.e. a percentile difference between the suboptimal cost achieved in Figure [2] 

50 

(^^VN{xt,u)) and the optimal costs that were precalculated with Matlab's 
t=i 

50 

quadprog (^^V^(a;t)), both for the time r and prediction horizon N. Note 

t=i I 
that our total cost is usually better than that of |23| when the available time 



is short (r < 2) and for r > 2 both algorithms solve the corresponding opti- 
mization problem exactly. Also note that, due to its low complexity iteration, 
our algorithm performs more than ten times the amount of iterations than the 
algorithm from 23 [. 
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PCDM 


[231 




N 


T+er / Perf TjOss (%) 


Ttpr / Pprf Loss i%\ 


0.1 


1500 


7 1 23.9 


1 / 60.18 


0.2 


750 


30 / 17.59 


2 / 36.48 


0.3 


500 


240 / 10.42 


5 / 22.1 


0.5 


300 


1803 / 7.94 


22 / 16.36 


1 


150 


12244 / 2.74 


258 / 8.44 


2 


75 


67470 / 


2495 / 


3 


50 


153850 / 


8663 / 


5 


30 


382810 / 


38110 / 



Table 2: Number of iterations and performance loss, for different times r. 

Jf..3. Implementation of the MPC scheme using Siemens S7-1200 PLC 

Due to the limitations, in both hardware and programming language, of the 
S7-1200 PLC, a proper implementation of any distributed optimization algo- 
rithm, in the sense of distributed computations and passing information between 
processes running on different cores, cannot be undertaken on it. However, to 
illustrate the fact that our PCDM algorithm is suitable for control devices with 
limited computational power and memory, we implemented it in a centralized 
manner for an MPC scheme in order to control the quadruple tank plant. We 
note that S7-1200 PLC is considered an entry-level PLC, with 50 KB of main 
memory, 2 MB of load memory (mass storage) and 2 KB of backup memory. 
There are two main function blocks for the algorithm itself, one that updates 
q{x) — Wi' + w in the quadratic objective function / given the current levels of 
the four tanks and one in which Algorithm PCDM is implemented for solving 
problem Both blocks contain Structured Control Language which corre- 
sponds to lEC 1131.3 standard. The remaining function blocks are used for 
converting the I/O for the plant to corresponding metric values. The elements 
of the problem which occupy the most memory is the Q G K2A'x2Af j^^trix of the 
objective function / and matrix W G M.^^^* for updating q{x). Both matrices 



20 



Cycle Time 


5 s 


Prediction Horizon 


10 


20 


30 


Maximum Number of Iter. 


104 


39 


15 


Used Memory (%) 


59 


72 


88 



Table 3: Available number of iterations and memory usage of Alg. PCDM 

are precomputed offline using Matlab and then stored in the work memory using 
Data Blocks. The components of the problem which require updating are the 
input trajectory vectors u' and the vector q{x) of the objective function /(u) 
which is dependent of the current state of the plant and of the current set point. 
The evolution of the tank levels and input ratios of the plant are recorded in 
Matlab on the plant's PC workstation, via an OPC server and Ethernet con- 
nection. In accordance with the imposed sample time of 5 seconds, the cycle 
time of the S7-1200 PLC is also limited to this interval. 

Due to this cycle time, the limited size of the S7-1200's work memory and 
its processing speed, the number of iterations of the Algorithm PCDM that can 
be computed are also limited. In Table [3] the number of iterations available 
per prediction horizon, included in the 5 seconds cycle time, and the memory 
requirements for these prediction horizons are presented. Although the numbers 
of computed iterations seem small, we have found in practice that the subop- 
timal MPC scheme still stabilizes the quadruple tank process and ensures set 
point tracking. The results of the control process are presented in Fig. [3] for 
a prediction horizon TV = 20: the continuous lines represent the evolution of 
water levels in each of the four tanks, while the dashed lines are their respective 
set points. We choose two set points. We first let the plant get near its first 
setpoint, after which we choose a new set point which is an equilibrium point 
for the plant. As it can be observed from the figure, the MPC scheme still steers 
the process to the respective set points. 



21 



level(cm) 




500 1000 1500 1800 
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Figure 3: Evolution of tank levels 1-4 (top), 2-3 (bottom), with continuous lines, 
against their respective set points, with dashed lines. 

4.4- Implementation of MFC scheme for random network systems 

We now wish to outline a comparison of results between algorithm PCDM 
and that of [2^ when solving QP problems arising from MFC for random net- 
work systems. Both algorithms were implemented in the same manner as de- 
scribed in Section 14.21 We considered random network systems with dynamics 
(fTTj) generated as follows: the entries of system matrices A"-^ and B^^ are taken 
from a normal distribution with zero mean and unit variance. Matrices A*^ 
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are then scaled, so that they become neutrahy stable. Matrices >r and 
)^ are random. The input variables are constrained to lie in box sets whose 
boundaries are generated randomly. The terminal cost matrices P* arc taken to 
be the solution of the SDP problem given in Lemma [5] For each subsystem the 
number of inputs is taken rrii = 5 or rrii — 10. We let the prediction horizon 
range between = 6 to = 120. The subsystems are arranged in a ring, i.e. 
A/"* = {i — + 1}. We first considered M = 8 subsystems, matching the 
number of cores on our PC. Parallel implementation was also carried out for 
M — 16 subsystems, with each core of the PC running two processes. The re- 
sulting random QP problems have p = MNnii variables. The stopping criterion 
for each algorithm is /(u^) — /* < 0.001, with /* being precomputcd for each 
problem using Matlab's quadprog. For each prediction horizon, 10 simulations 
were run, starting from different random initial states. 
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M 
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CPU (s) 


Iter 


CPU (s) 


Iter 


CPU (s) 


CPU (s) 


8 


480 


0.47 


1396 


1.904 


682 


0.663 


1.08 




960 


2.21 


2839 


21.52 


1475 


9.15 


3.57 




3200 


256.4 


8671 


911.2 


4197 


265.3 


39.8 




4800 


857.2 


12750 


7864.4 


6182 


1114 


139.9 




9600 


2223.1 


16950 




* 


3125 


307.5 


16 


480 


4.36 


2600 


4.66 


1615 


0.99 


0.97 




960 


15.02 


4792 


25.18 


2798 


14.17 


3.21 




3200 


377.6 


13966 


612.8 


8462 


423.1 


41.3 




4800 


1524.7 


23539 


3061.7 


14241 


2161.1 


134.03 




9600 


3415.1 


29057 


* 




4773 


308.4 



Table 4: CPU time in seconds and nr. of iterations for alg. PCDM and 
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Table |4] presents the average CPU time in seconds for the execution of each 
algorithm. It illustrates that Algorithm PCDM, with its design for distributed 
computations and simple iterations, usually performs better than that in 
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where the assumption is that for each iteration, a QP problem of size p/M needs 
to be solved. The entries with * denote that the algorithm would have taken over 
5 hours to complete. Also note that our implementation of the algorithm from 



23| , for problems of larger dimensions, i.e starting with p = 3200, takes less time 



for it to complete if the problem is divided between M ~ 16 subsystems than 
M = 8. This is due to the fact that the solver qpip takes much more time to solve 
problems of size 600 in the case oip = 4800 and M = 8 than problems of size 300 
for p = 4800 and M = 16. Also, the transmission delays between subsystems 
are negligible in comparison with these qpip times. We have also implemented 
Algorithm PCDM in a centralized manner, i.e. without using MPI and, as can 
be seen from the tabic, wc gain spcedups of computation when the algorithm 
is parallelized. Algorithm PCDM is outperformed by Matlab's quadprog, but 
do note that quadprog is not designed for distributed implementation and there 
are no transmission delays between processes. 



5. Conclusions 

In this paper wc have proposed a parallel optimization algorithm for solving 
smooth convex problems with separable constraints that may arise e.g in MPC 
for general linear systems comprised of interconnected subsystems. The new 
optimization algorithm is based on the block coordinate descent framework but 
with very simple iteration complexity and using local information. We have 
shown that for strongly convex objective functions it has linear convergence rate. 
An MPC scheme based on this optimization algorithm was derived, for which 
every subsystem in the network can compute feasible and stabilizing control 
inputs using distributed computations. An analysis for obtaining local terminal 
costs from a distributed viewpoint was made which guarantees stability of the 
closed-loop interconnected system. Preliminary numerical tests show that this 
algorithm is suitable for MPC applications, especially those with hardware that 
has low computational power. 
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